Abstract

Background: Given genetic networks derived from two genomes, it may be difficult to decide if their local 
structures are similar enough in both genomes to infer some ancestral configuration or some conserved functional 
relationships. Current methods all depend on searching for identical substructures.

Methods: We explore a generalized vertex proximity criterion, and present analytic and probability results for the 
comparison of random lattice networks. 

Results: We apply this criterion to the comparison of the genetic networks of two evolutionarily divergent yeasts, 
Saccharomyces cerevisiae and Schizosaccharomyces pombe, derived using the Synthetic Genetic Array screen. We
show that the overlapping parts of the networks of the two yeasts share a common structure beyond the shared 
edges. This may be due to their conservation of redundant pathways containing many synthetic lethal pairs of 
genes. 

Conclusions: Detecting the shared generalized adjacency clusters in the genetic networks of the two yeasts shows
that this analytical construct can be a useful tool in probing conserved network structure across divergent 
genomes. 



Introduction 

As two related organisms diverge through evolutionary time, functional relationships among genes may alter. Some relationships may weaken, others strengthen, some may disappear while new ones appear. New genes or variants of genes may take on specific functions, while other genes may be inactivated or lost. These changes proceed independently in the two evolving species. Even if most changes are local, affecting one or two relationships and two or three genes, after a long enough period of time the inventory of relationships in each of the species may reflect relatively little of the original pattern in the common ancestor, and the two inventories may be quite different from each other.
Given two graphs representing the functional genetic networks of two organisms, then, it may be difficult to decide if the local structures are similar enough in both graphs to infer some ancestral configuration or some conserved functional relationships. Current methods all depend on searching for identical substructures [1]. We have recently explored the notion of generalized adjacency to compare chromosomal gene ordering in two or more genomes [2-4] as a way of parametrizing the relative importance of conserved gene order versus total gene content within a cluster. However, this concept is not tied to the physical nature of chromosomes; it has a graph-theoretical definition based solely on the adjacency of pairs of genes as a consequence of their linear order along the chromosome. As such it is applicable to more general graphs. In this paper we will use generalized adjacency to compare the genetic networks of two species, representing the functional interactions between their genes.

Our work falls in the tradition of situating small world networks between regular lattice structures, with their dense local connections throughout, and completely random graphs, with their short characteristic path lengths. Small world networks tend to have both properties, as discussed by Goldberg and Roth [5]. In the next section we define generalized vertex adjacency in a graph, and generalized adjacency clusters. Since these definitions involve a parameter, we invoke our previous work on finding a "natural" value for this parameter, and discuss its application to networks. We then sketch some analytic results on the distribution of the number of generalized adjacencies in the comparison of two randomly labelled regular lattices, and propose a general result for the comparison of two arbitrary graphs on the same set of vertices.

We apply our concepts to the comparison of the genetic networks of Saccharomyces cerevisiae and Schizosaccharomyces pombe. The networks were obtained using Synthetic Genetic Array screens for "synthetic lethals" among virtually all pairs of genes whose individual inactivation is not lethal [6-8]. Typically, the two genes of such a pair lie on two parallel pathways that converge on a common endpoint, as illustrated in Figure 1. These pathways buffer each other, so that the inactivation of one or more genes on only one of the pathways will not affect survival, but inactivating at least one gene on each of the two pathways is lethal.

We discover a pattern of local clustering in the edges common to both networks beyond what is defined by vertex adjacency alone. We suggest this is a consequence of the synthetic lethals methodology for building the networks.



Methods 

Generalized vertex adjacency 

Let $S$ be a gene network with a gene set $V = \{1, \ldots, n\}$. Two genes $g$ and $h$ are $i$-adjacent, and the pair $\{g, h\}$ is an $i$-adjacency, in the gene network $S$, written $g \sim_i h$ in $S$, if there are $i - 1$ genes between them in $S$ along a shortest path from one gene to the other. We define genes $g$ and $h$ to be $(i, j)$-adjacent, and the pair $\{g, h\}$ is called an $(i, j)$-adjacency, in two gene networks $S$ and $T$, if they are $i$-adjacent in either one of the gene networks and $j$-adjacent in the other. We say $g$ is an $i$-adjacent neighbor of the gene $h$ in a gene network $S$ if $g$ and $h$ are $i$-adjacent in $S$.
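These definitions reduce to shortest-path (hop) distances. The following is a minimal sketch, assuming each network is stored as a Python dict mapping each gene to the set of its 1-adjacent genes; the function names and the two small example networks are hypothetical, chosen only for illustration. Pairs with no connecting path are treated as not $i$-adjacent for any $i$.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source`; adj maps each gene to its set of neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def adjacency_order(adj):
    """Map each unordered pair {g, h} to i, where g and h are i-adjacent
    (i - 1 genes between them on a shortest path). Unreachable pairs are omitted."""
    order = {}
    for g in adj:
        for h, d in bfs_distances(adj, g).items():
            if d > 0:
                order[frozenset((g, h))] = d
    return order


def is_ij_adjacent(order_S, order_T, g, h, i, j):
    """True if {g, h} is i-adjacent in one network and j-adjacent in the other."""
    pair = frozenset((g, h))
    observed = (order_S.get(pair), order_T.get(pair))
    return observed in {(i, j), (j, i)}


# Two hypothetical paths on genes 1..4: S = 1-2-3-4 and T = 1-3-2-4.
S = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
T = {1: {3}, 3: {1, 2}, 2: {3, 4}, 4: {2}}
order_S, order_T = adjacency_order(S), adjacency_order(T)
print(is_ij_adjacent(order_S, order_T, 1, 3, 1, 2))  # True: 2-adjacent in S, 1-adjacent in T
```

Computing all distances once per network and then answering $(i, j)$ queries by lookup keeps the per-pair test cheap, which matters when every pair of genes must be examined.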

We denote by $E_M^{\theta}$ the set of all $l$-adjacencies in a network $M$, where $1 \le l \le \theta$. For two networks $S$ and $T$ with the same vertex set $V = \{1, \ldots, n\}$, we define a subset $C \subseteq V$ to be a $(\psi, \theta)$ generalized adjacency cluster, or $(\psi, \theta)$ cluster, if $C$ is exactly the vertex set of a connected component of the graph

$$G_{\psi\theta} := \left(V,\ \left(E_S^{\psi} \cap E_T^{\theta}\right) \cup \left(E_S^{\theta} \cap E_T^{\psi}\right)\right).$$

To obtain the $(\psi, \theta)$ clusters of two gene networks $S$ and $T$, the new network $G_{\psi\theta}$ needs to be created first. It can be constructed by connecting two genes of the gene networks $S$ and $T$ if they are $i$-adjacent in $S$ and $j$-adjacent in $T$, where $\max(i, j) \le \max(\psi, \theta)$ and $\min(i, j) \le \min(\psi, \theta)$. Figure 2 illustrates how the grid networks $S$ and $T$ determine the (1, 2) clusters {2, 3, 4, 5, 7, 9, 10, 12, 13, 14, 15, 19, 20}, {11, 17, 18, 22} and {16, 21, 23, 24}. Figures 3 and 4 depict the same process for triangular graphs and hexagonal graphs, respectively.
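The two-step construction (build $G_{\psi\theta}$, then take its connected components) can be sketched as follows. This is an illustrative sketch, not the authors' code: networks are assumed to be adjacency dicts on the same vertex set, the function names are hypothetical, and the `min_size=2` default (dropping singleton components) is our assumption, based on the figure captions listing only multi-gene clusters.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source`; adj maps each gene to its set of neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def build_G(adj_S, adj_T, psi, theta):
    """Edges of G_{psi,theta}: i-adjacent in S and j-adjacent in T with
    max(i, j) <= max(psi, theta) and min(i, j) <= min(psi, theta)."""
    hi, lo = max(psi, theta), min(psi, theta)
    G = {v: set() for v in adj_S}
    for g in adj_S:
        dist_T = bfs_distances(adj_T, g)
        for h, i in bfs_distances(adj_S, g).items():
            j = dist_T.get(h)
            if h != g and j is not None and max(i, j) <= hi and min(i, j) <= lo:
                G[g].add(h)
    return G


def clusters(adj_S, adj_T, psi, theta, min_size=2):
    """(psi, theta) clusters: vertex sets of the connected components of G_{psi,theta}.
    Components smaller than min_size (singletons by default) are not reported."""
    G = build_G(adj_S, adj_T, psi, theta)
    seen, found = set(), []
    for v in G:
        if v not in seen:
            component = set(bfs_distances(G, v))
            seen |= component
            if len(component) >= min_size:
                found.append(component)
    return found


S = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}   # hypothetical path 1-2-3-4
T = {1: {3}, 3: {1, 2}, 2: {3, 4}, 4: {2}}   # hypothetical path 1-3-2-4
print(clusters(S, T, 1, 1))  # [{2, 3}]: the only shared edge
print(clusters(S, T, 1, 2))  # [{1, 2, 3, 4}]
```

The example shows the effect of relaxing the criterion: the two paths share a single edge, so the only (1, 1) cluster has two genes, while under (1, 2)-adjacency every pair except {1, 4} qualifies and all four genes fall into one cluster.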

Figure 1 Synthetic lethal. Parallel pathways, indicated by arrows, converging at a single gene in a genetic network. Genes represented by dots. Synthetic lethal pairs of genes connected by red lines.

Figure 2 Square grid. Determination of (1, 2) clusters (or (2, 1) clusters) in square grid graphs; the clusters are {2, 3, 4, 5, 7, 9, 10, 12, 13, 14, 15, 19, 20}, {11, 17, 18, 22} and {16, 21, 23, 24}.

Weight function

The definition of generalized adjacency cluster in the previous section does not discriminate among pairs of $(i, j)$-adjacent genes as long as $i$ and $j$ do not exceed some cut-off values. However, it seems reasonable to think that $(i, j)$-adjacencies with smaller $i$ and $j$ should be weighted more heavily in defining clusters. To explore this, consider two networks $S$ and $T$ with the same vertices. Let $\omega_{ij}$ be the weight on two vertices that are $(i, j)$-adjacent, i.e., $i$-adjacent in one of the networks and $j$-adjacent in the other, such that
1. $0 \le \omega_{ij} = \omega_{ji}$, $i, j \in \{1, 2, \ldots, n-1\}$

3. $\omega_{ij} \ge \omega_{kl}$ if

(a) $\max(i, j) < \max(k, l)$, or

(b) $\max(i, j) = \max(k, l)$ and $\min(i, j) < \min(k, l)$

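This is a very general class of weights with reasonable monotonicity and total weight conditions. We define the dissimilarity between two gene networks $S$ and $T$ as

$$d(S, T) = 2P - \sum_{i=1}^{l} \sum_{j=1}^{l} \omega_{ij}\, n_{ij}, \qquad (1)$$

where $P$ is the number of pairs $\{x, y\}$ that are (1, 1)-adjacent in two identical gene networks, $n_{ij}$ is the total number of pairs $\{x, y\}$ that are $i$-adjacent in $S$ and $j$-adjacent in $T$, and $l$ is the diameter of the network. We have argued elsewhere [4] that the "natural" way of finding weights is to minimize $d$, and we proved the following surprising result.

Theorem 1. Let $a_k = \left\lfloor \dfrac{\sqrt{1 + 8(k-1)} + 1}{2} \right\rfloor$. The weight $\omega$ that minimizes $d(S, T)$ has, for $j \le i$ (and symmetrically $\omega_{ji} = \omega_{ij}$),

$$\omega_{ij} = \begin{cases} \dfrac{1}{k^*}, & \text{if } i < a_{k^*}, \text{ or } i = a_{k^*} \text{ and } j \le k^* - \tfrac{1}{2}a_{k^*}(a_{k^*} - 1), \\[4pt] 0, & \text{otherwise,} \end{cases} \qquad (2)$$

where $k^*$ is an integer that maximizes the function

$$\frac{1}{k}\left[\sum_{i=1}^{a_k - 1} \sum_{j=1}^{i} \left(n_{ij} + n_{ji}\right) + \sum_{j=1}^{k - \frac{1}{2}a_k(a_k - 1)} \left(n_{a_k j} + n_{j a_k}\right)\right], \qquad (3)$$

where $n_{ij}$ is the number of gene pairs $i$-adjacent in $S$ and $j$-adjacent in $T$.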

This suggests that uniform weights are appropriate for all $(i, j)$-adjacencies up to a certain cutoff. Empirical work indicates that $k^*$ is of the order of …, where $n$ is the number of vertices in the network, so that $i$ and $j$ would be cut off at some value …; for example, for a network with 100 vertices, it should suffice to consider 2- and 3-adjacencies, but 4-adjacencies need not be considered.
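Theorem 1 amounts to ranking the pairs $(i, j)$, $j \le i$, in the order imposed by weight condition 3 (by $\max(i,j)$, then by $\min(i,j)$), putting a uniform weight $1/k$ on the first $k$ of them, and choosing the $k$ that maximizes function (3). The sketch below is ours, not the authors' implementation: it follows equation (3) as written above (so a diagonal count $n_{ii}$ enters twice), takes the counts $n_{ij}$ as a dict keyed by $(i, j)$, and uses made-up counts in the example. It also checks that $a_k$ is the larger index of the $k$-th pair in that ordering.

```python
import math


def a(k):
    """a_k = floor((sqrt(1 + 8(k - 1)) + 1) / 2), computed exactly with integer sqrt."""
    return (math.isqrt(1 + 8 * (k - 1)) + 1) // 2


def ordered_pairs(count):
    """First `count` pairs (i, j), j <= i, ordered as in weight condition 3:
    by increasing max(i, j), then by increasing min(i, j)."""
    pairs, i = [], 1
    while len(pairs) < count:
        for j in range(1, i + 1):
            if len(pairs) == count:
                break
            pairs.append((i, j))
        i += 1
    return pairs


def score(n, k):
    """Function (3): uniform weight 1/k on the first k pairs; n maps (i, j) -> n_ij.
    As in (3), a diagonal term n_ii is counted twice."""
    return sum(n.get((i, j), 0) + n.get((j, i), 0) for i, j in ordered_pairs(k)) / k


def optimal_k(n, k_max):
    """k* maximizing (3) over 1..k_max (ties go to the smallest k)."""
    return max(range(1, k_max + 1), key=lambda k: score(n, k))


def optimal_weights(k_star):
    """Weights (2): 1/k* on the first k* pairs and their mirror images, 0 elsewhere."""
    w = {}
    for i, j in ordered_pairs(k_star):
        w[(i, j)] = w[(j, i)] = 1 / k_star
    return w


# Hypothetical counts n_ij for two small networks.
n = {(1, 1): 2, (1, 2): 5, (2, 1): 5}
k_star = optimal_k(n, 10)
print(k_star, optimal_weights(k_star))  # 2 {(1, 1): 0.5, (2, 1): 0.5, (1, 2): 0.5}
```

With these counts, including the (2, 1) and (1, 2) adjacencies raises the score from 4 to 7, while adding the empty (2, 2) class only dilutes it, so $k^* = 2$.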
Figure 3 Triangular grid. Determination of (1, 2) clusters (or (2, 1) clusters) in triangular grid graphs; the clusters are {1, 2, 3, 4}, {6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20} and {21, 22, 23, 24}.



The expected number of (i, j) adjacencies in two random networks

An essential step in studying gene clusters is to verify their significance. Random networks are often used to estimate the significance of clusters. In this section, we present some characteristics of the expected number of $(i, j)$-adjacencies in two random networks, which can then be used in evaluating cluster significance.

Theorem 2. Let $M$ be a randomly labelled square grid network with $N$ vertices. Then the number of $i$-adjacencies, $n_i$, in the network $M$ converges in distribution to the Poisson with parameter

$$E(n_i) = 2i + O\!\left(\frac{1}{N}\right), \qquad (4)$$

the expected number of $i$-adjacencies in the network $M$.
Proof. Because $M$ is a random square grid network, we can use a coordinate system to represent it. Vertices in the network correspond to the points in the plane with integer coordinates, $x$-coordinates being in the range $1, \ldots, m$ and $y$-coordinates in the range $1, \ldots, n$, where $N = mn$. Without loss of generality, we set $m \le n$. Two vertices in the network are $i$-adjacent if the $L_1$ distance between them in the integer coordinates is $i$.

Let $Y_i^M(u, v)$ be 1 if vertices $u, v$ are $i$-adjacent in the network $M$ and 0 otherwise. Then $n_i = \sum_{(u,v)} Y_i^M(u, v)$. Since most vertices have $4i$ $i$-adjacent neighbors, we can show that

$$P\left(v,\, Y_i^M(u, v) = 1 \mid u\right) = \frac{4i}{mn - 1} + O\!\left(\frac{1}{(mn)^2}\right), \qquad (5)$$



where the error term is due to edge effects [9]. Since $N = mn$,

$$P\left(Y_i^M(u, v) = 1 \mid (u, v)\right) = P\left(v,\, Y_i^M(u, v) = 1 \mid u\right) P(u) = \frac{4i}{N(N-1)} + O\!\left(\frac{1}{N^3}\right), \qquad (6)$$

where the error term includes the edge effects detailed in equation (5). Then

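$$E(n_i) = \sum_{(u,v)} P\left(Y_i^M(u, v) = 1 \mid (u, v)\right) = \sum_{(u,v)} \left[\frac{4i}{N(N-1)} + O\!\left(\frac{1}{N^3}\right)\right] = \frac{N(N-1)}{2}\left[\frac{4i}{N(N-1)} + O\!\left(\frac{1}{N^3}\right)\right] = 2i + O\!\left(\frac{1}{N}\right). \qquad (7)$$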



Therefore, based on the proof of Theorem 2 in [10], we can conclude that $n_i$ converges in distribution to the Poisson with parameter $E(n_i)$, the expected number of $i$-adjacencies in the network $M$.
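The proof rests on interior vertices of the grid having exactly $4i$ vertices at $L_1$ distance $i$, with vertices near the boundary having fewer (the edge effects). A brute-force sketch that counts them on an $m \times n$ grid with coordinates $1, \ldots, m$ and $1, \ldots, n$; the function names are hypothetical.

```python
def count_at_distance(m, n, u, i):
    """Number of vertices of the m x n grid at L1 distance exactly i from u = (x, y)."""
    x0, y0 = u
    return sum(
        1
        for x in range(1, m + 1)
        for y in range(1, n + 1)
        if abs(x - x0) + abs(y - y0) == i
    )


def mean_count(m, n, i):
    """Average number of i-adjacent neighbours over all N = mn vertices."""
    total = sum(count_at_distance(m, n, (x, y), i)
                for x in range(1, m + 1) for y in range(1, n + 1))
    return total / (m * n)


print(count_at_distance(20, 20, (10, 10), 3))  # 12 = 4i for an interior vertex
print(count_at_distance(20, 20, (1, 1), 1))    # 2: a corner loses neighbours
print(mean_count(20, 20, 1))                   # 3.8 rather than 4
```

On a 20 x 20 grid the average number of 1-adjacent neighbours is 3.8 rather than 4, because boundary vertices have fewer than four neighbours; this shortfall is the edge effect carried by the error terms above.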

Theorem 3. For $S$ and $T$ two random square grid networks with the same $N$ vertices, the number of pairs of vertices $n_{ij}$ that are $i$-adjacent in $S$ and $j$-adjacent in $T$ converges in distribution to the Poisson with parameter

$$E(n_{ij}) = 8ij + O\!\left(\frac{1}{N}\right), \qquad (8)$$

the expected number of $(i, j)$-adjacencies in networks $S$ and $T$.



Proof. Let $Y_i^S(g, h)$ be 1 if vertices $g, h$ are $i$-adjacent in the random square grid network $S$ and 0 otherwise. Similarly, define $Y_j^T(g, h)$ to be 1 if vertices $g, h$ are $j$-adjacent in the random square grid network $T$ and 0 otherwise. Let $Y_{ij}^{(S,T)}(g, h)$ be 1 if vertices $g, h$ are $i$-adjacent in $S$ and $j$-adjacent in $T$; otherwise $Y_{ij}^{(S,T)}(g, h) = 0$. Because of the independence of $g, h$ being $i$-adjacent in $S$ and $j$-adjacent in $T$, the probability that $g$ and $h$ are $(i, j)$-adjacent in $S$ and $T$ is

$$P\left(Y_{ij}^{(S,T)}(g, h) = 1 \mid (g, h)\right) = P\left(Y_i^S(g, h) = 1 \mid (g, h)\right) \cdot P\left(Y_j^T(g, h) = 1 \mid (g, h)\right) = \frac{16ij}{N^2(N-1)^2} + O\!\left(\frac{1}{N^5}\right). \qquad (9)$$



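So the expected number of $(i, j)$-adjacencies in the two networks $S$ and $T$ is

$$E(n_{ij}) = \left[\frac{16ij}{N^2(N-1)^2} + O\!\left(\frac{1}{N^5}\right)\right] \sum_{(g,h) \text{ in } S,T} 1. \qquad (10)$$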



The term $\sum_{(g,h) \text{ in } S,T} 1$ in equation (10) represents the total number of $(g, h)$ combinations in the two networks $S$ and $T$, based on the pairs of locations of $(g, h)$ in $S$ and $T$. There are $\frac{1}{2}N(N-1)$ possible pairs of locations for $(g, h)$ in each of the two networks, and 2 alternatives for each gene pair $(g, h)$ in $S$ and $T$. So $\sum_{(g,h) \text{ in } S,T} 1 = 2\left(\tfrac{1}{2}N(N-1)\right)^2 = \tfrac{1}{2}N^2(N-1)^2$. Hence, the expected number of $(i, j)$-adjacencies in the two networks $S$ and $T$ is

$$E(n_{ij}) = 8ij + O\!\left(\frac{1}{N}\right). \qquad (11)$$



Therefore, based on the proof of Theorem 2 in [10], we can conclude that $n_{ij}$ converges in distribution to the Poisson with parameter $E(n_{ij})$, the expected number of $(i, j)$-adjacencies in networks $S$ and $T$.

More generally we can use the same techniques to 
prove Theorems 4 and 5: 

Theorem 4. Let $D$ be the degree of a gene in the random genetic grid network, i.e., the number of 1-adjacent neighbors of this gene in the network. For two random genetic lattice networks $S$ and $T$ with the same genes, the number of pairs of genes $n_{ij}$ that are $i$-adjacent in $S$ and $j$-adjacent in $T$ converges in distribution to the Poisson with parameter

$$E(n_{ij}) = \;\cdots\; + O(\cdots) \qquad (12)$$



Even for networks as small as 400 vertices, simulations indicate that the distribution of $n_{ij}$ is close to the Poisson in Theorem 4 for square ($D = 4$), hexagonal ($D = 3$) and triangular ($D = 6$) grids, as well as for linear networks ($D = 2$).
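A simulation of this kind can be sketched for two 20 x 20 square grids (400 vertices). Here we assume "random network" means a uniformly random labelling of a fixed lattice. Relabelling only $T$ is enough, because relabelling both networks by independent uniform permutations gives the same distribution of $n_{ij}$. By linearity of expectation, the exact mean under this model is $D_i(S)D_j(T)/\binom{N}{2}$, where $D_k$ is the number of $k$-adjacent pairs. For the square grid this tends to $8ij$ as the grid grows. A Poisson variable has variance equal to its mean, so we compare the two. The function names are hypothetical.

```python
import random


def grid_pairs(m, n, i):
    """Unordered pairs (a, b), a < b, of vertices of an m x n grid at L1 distance i.
    Vertex (x, y) gets index x * n + y."""
    pairs = set()
    for x in range(m):
        for y in range(n):
            for dx in range(-i, i + 1):
                rest = i - abs(dx)
                for dy in {rest, -rest}:
                    x2, y2 = x + dx, y + dy
                    if 0 <= x2 < m and 0 <= y2 < n:
                        a, b = x * n + y, x2 * n + y2
                        if a < b:
                            pairs.add((a, b))
    return pairs


def simulate_nij(m, n, i, j, trials, seed=0):
    """Sample n_ij for two randomly labelled m x n grids S and T."""
    rng = random.Random(seed)
    N = m * n
    pairs_S, pairs_T = grid_pairs(m, n, i), grid_pairs(m, n, j)
    samples = []
    for _ in range(trials):
        perm = list(range(N))
        rng.shuffle(perm)  # uniformly random relabelling of T
        relabelled = {tuple(sorted((perm[a], perm[b]))) for a, b in pairs_T}
        samples.append(len(pairs_S & relabelled))
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    exact_mean = len(pairs_S) * len(pairs_T) / (N * (N - 1) / 2)
    return mean, var, exact_mean


mean, var, exact_mean = simulate_nij(20, 20, 1, 2, trials=1000)
print(f"sample mean {mean:.2f}, sample variance {var:.2f}, exact mean {exact_mean:.2f}")
```

A sample variance close to the sample mean is a quick, informal check of the Poisson approximation; a formal test would compare the whole empirical distribution with the Poisson.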
Looking beyond regular networks: 

Theorem 5. Let $D_k(M)$ be the number of $k$-adjacent gene pairs in the random network $M$. For two random networks $S$ and $T$ with the same $N$ vertices, the number of pairs of vertices $n_{ij}$ that are $i$-adjacent in $S$ and $j$-adjacent in $T$ converges in distribution to the Poisson with parameter

$$E(n_{ij}) = \frac{2\,D_i(S)\,D_j(T)}{N(N-1)} + O\!\left(\frac{1}{N}\right). \qquad (13)$$
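Theorem 5 needs only the counts $D_k$ for each network, so it also applies to irregular networks such as the yeast networks below. The following sketch computes these counts by breadth-first search and gives the mean of $n_{ij}$ when one network is relabelled uniformly at random. In that case, each of the $D_i(S)$ pairs lands on a uniformly random one of the $\binom{N}{2}$ vertex pairs, so the mean is exactly $D_i(S)D_j(T)/\binom{N}{2}$. We assume both networks are adjacency dicts on the same $N$ vertices; pairs in different connected components are not $k$-adjacent for any $k$. The function names and example graphs are hypothetical.

```python
from collections import deque


def distance_counts(adj):
    """D_k(M) for every k: number of unordered vertex pairs at shortest-path distance k."""
    counts = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for d in dist.values():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
    # each unordered pair was counted once from each endpoint
    return {k: c // 2 for k, c in counts.items()}


def expected_overlap(adj_S, adj_T, i, j):
    """Mean of n_ij under a uniformly random relabelling: D_i(S) * D_j(T) / C(N, 2)."""
    N = len(adj_S)
    D_S, D_T = distance_counts(adj_S), distance_counts(adj_T)
    return D_S.get(i, 0) * D_T.get(j, 0) / (N * (N - 1) / 2)


path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
star5 = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(distance_counts(path5))                # {1: 4, 2: 3, 3: 2, 4: 1}
print(distance_counts(star5))                # {1: 4, 2: 6}
print(expected_overlap(path5, star5, 1, 2))  # 4 * 6 / 10 = 2.4
```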



Results: genetic networks in S. cerevisiae and S. pombe

Dixon et al. [8] presented an extraordinary comparison 
of the genetic networks of Saccharomyces cerevisiae and 
Schizosaccharomyces pombe, two rather distant yeast 
genomes. Their results are summarized in their Figure 
2, which we reproduce here as Figure 5. We separated 
the two overlapping networks based on the colours in 
this diagram, as depicted in Figure 6. 

We compiled the graph-theoretical characteristics of 
these networks: number of vertices, average vertex 
degree, number of edges, and present them in Table 1. 
Figure 5 Overlapping S. cerevisiae and S. pombe genetic networks. Green edges: S. cerevisiae interactions only; blue edges: S. pombe only; red edges: common to both networks. From [8]. ©2008 PNAS, S. Dixon et al.
Figure 6 Separate networks. S. cerevisiae (left) and S. pombe (right) genetic networks. Red edges are common to both.



The details of the vertex degree distributions are given 
in Figure 7. 

We then carried out a number of simulations. First, we simulated random networks having the same statistical characteristics as in Table 1 and Figure 7. This showed the random networks to be deficient in clusters of 2, 3 and 4 genes compared to the yeast networks, under each of the (1,1)-, (2,2)- and (3,3)-adjacency criteria (see Table 2). In passing, we mention that the analysis of regular grid networks earlier in this paper predicts very much smaller numbers of clusters than the random networks. Second, we fixed the common edges in both yeasts to initialize the random networks, and then generated the rest of the edges in conformity with Table 1 and Figure 7. This ensured that the (1,1)-adjacency results would be the same as or close to the yeast results (see Table 2), but again the yeast networks showed a significant excess of clusters under (2,2)-adjacency. (The significance can be verified in Figure 8.)
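The text does not say how the random networks were generated, so the following is only one hypothetical way to set up such simulations. We swap in degree-preserving double-edge swaps, a standard randomization that keeps every vertex degree, and therefore the degree distribution, unchanged. The optional `fixed` set leaves chosen edges untouched, mimicking the "fixed common edges" runs. We also assume that a "cluster of k genes" in Table 2 is a connected k-vertex subset of the graph of common adjacencies. That assumption rests only on the 54 two-gene clusters under 1-adjacency matching the 54 common edges in Table 1. All function names are hypothetical.

```python
import random


def double_edge_swap(edges, n_swaps, rng, fixed=frozenset(), max_tries=100_000):
    """Degree-preserving rewiring: replace edges {a, b}, {c, d} by {a, d}, {c, b},
    rejecting self-loops and duplicate edges. Edges in `fixed` are never moved."""
    edges = {frozenset(e) for e in edges}
    movable = [e for e in edges if e not in fixed]
    if len(movable) < 2:
        return edges
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        p, q = rng.sample(range(len(movable)), 2)
        a, b = tuple(movable[p])
        c, d = tuple(movable[q])
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible rewirings
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2 or new1 in edges or new2 in edges:
            continue
        edges -= {movable[p], movable[q]}
        edges |= {new1, new2}
        movable[p], movable[q] = new1, new2
        done += 1
    return edges


def to_adj(edges):
    """Adjacency dict from an iterable of 2-element edges."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_connected_subsets(adj, k):
    """Number of k-vertex subsets inducing a connected subgraph: each one is grown
    from a connected (k-1)-subset by adding one neighbouring vertex."""
    level = {frozenset([v]) for v in adj}
    for _ in range(k - 1):
        level = {s | {w} for s in level for v in s for w in adj[v] if w not in s}
    return len(level)


rng = random.Random(0)
ring = [(v, (v + 1) % 10) for v in range(10)] + [(0, 5), (2, 7)]
rewired = double_edge_swap(ring, 50, rng, fixed={frozenset((0, 1))})
print(count_connected_subsets(to_adj(rewired), 3))
```

In a full simulation one would rewire each species' network, intersect their $(\psi, \psi)$-adjacencies, and count connected subsets of each size in the result, repeating over many random draws.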

Table 1 Characteristics of comparative graph

yeast            vertices    average degree    edges
S. cerevisiae    89          4.36              194
S. pombe         85          3.34              142
in common        72                            54

Number of vertices, their average degree and number of edges pertinent to each of the two yeasts.

Figure 7 Network characteristics. Distribution of vertex degrees in yeast networks (x-axis: degree).

Table 2 Characteristics of comparative graph

cluster size    graph                 adjacency parameter
                                      1-adj.    2-adj.    3-adj.
2 genes         yeast                 54        253       746
                random                4.5       99        581
                fixed common edges    53        217       767
3 genes         yeast                 104       1,668     12,321
                random                1.2       442       …
                fixed common edges    101       1,292     12,185
4 genes         yeast                 136       10,417    159,167
                random                0.3       1,741     104,339
                fixed common edges    132       7,567     160,821

Simulations showing heightened numbers of common 2-gene, 3-gene and 4-gene clusters under (2,2)-adjacency in the yeast networks compared to random graphs.

One of the factors responsible for the increase in clusters under 2-adjacency is the incidence of parallel buffering pathways in the genetic organization of these yeasts. Figure 9 illustrates how such pathways determine subgraphs in the network that are essentially bipartite. There are no 1-adjacencies among the genes in a single pathway, but the back-and-forth pattern of edges between the two sides of the bipartite structure ensures that, under 2-adjacency, the genes in both pathways participate in clusters of various sizes.
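A toy illustration of this mechanism, under stated assumptions: two idealized buffering pathways of three genes each, in which every cross-pathway pair is synthetically lethal, and two hypothetical species that each lack a different cross edge. The networks, gene names and function names are all hypothetical. Within-pathway pairs are never 1-adjacent in either network, yet every one of them is 2-adjacent in both, so they are shared under 2-adjacency even though they contribute no shared edges.

```python
from collections import deque
from itertools import combinations


def distances(adj):
    """All-pairs hop distances by breadth-first search from every vertex."""
    out = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        out[s] = dist
    return out


def buffering_pathways(A, B, missing=()):
    """Idealised pair of buffering pathways: every cross pair (a, b) is a synthetic
    lethal edge except those listed in `missing`; no edges within a pathway."""
    adj = {v: set() for v in A + B}
    for a in A:
        for b in B:
            if (a, b) not in missing:
                adj[a].add(b)
                adj[b].add(a)
    return adj


A, B = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
S = buffering_pathways(A, B, missing={("a1", "b1")})  # hypothetical species 1
T = buffering_pathways(A, B, missing={("a2", "b2")})  # hypothetical species 2
dS, dT = distances(S), distances(T)
pairs = list(combinations(A + B, 2))
shared_edges = [p for p in pairs if dS[p[0]][p[1]] == 1 == dT[p[0]][p[1]]]
shared_2adj = [p for p in pairs if dS[p[0]][p[1]] == 2 == dT[p[0]][p[1]]]
print(len(shared_edges), len(shared_2adj))  # 7 6
print(shared_2adj)  # all six within-pathway pairs
```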

As for the observation that 3-adjacency does not increase the number of clusters over random networks more than is achieved by fixing the common edges, this is partly explained by the fact that, under 3-adjacency, the yeast networks show only about 50% more clusters of each size than the random networks, compared to roughly 2.5 to 6 times as many under 2-adjacency. Increasing the adjacency parameter in these networks simply results in large numbers of random clusters that swamp any subtle distinction between the fixed edge simulation and the yeast network.

Conclusions 

Generalized adjacency is a flexible but rigorous concept in the search for patterns of similarity among genetic networks. Although we analytically calculate properties of regular grid networks, e.g., linear, triangular, square and hexagonal grids, and although the average vertex degree of the empirically derived networks is in the same range as that of the hexagonal and square grids, the number of clusters in the real data is much higher than the grids predict. This can be attributed in large part to the dispersion of the degree distribution, which is non-existent for the grids.

Of greater interest is the inability of random networks with the same characteristics as the real networks to generate the same number of clusters. This is largely due to the small number of common adjacencies in the random networks, but even when this is forced to be the same, the yeast data showed an unexpected pattern of increased clustering under (2,2)-adjacency, for all sizes of cluster (see Table 2). This is partly explained by the way the networks were constructed using the synthetic lethals screen.

Figure 8 Test of cluster frequencies. Number of clusters containing three and four 2-adjacent genes in common, in 500 pairs of simulated networks with the same common 1-adjacencies as the two yeast networks.

Figure 9 Effect of synthetic lethality on cluster characteristics. Comparison of a network intersection containing a number of parallel buffering pathways with a network without such pathways but with the same number of vertices and edges.

In conclusion, generalized adjacency is potentially a useful tool in exploring the special combinatorial structure of genetic networks.